Abstract

Shannon mutual information measures how much information is, on average,
contained in a set of neural activities about a set of stimuli. It has been
used extensively to study neural coding in different brain areas. To apply a
similar approach to the encoding of a single stimulus, we need a quantity
specific to that stimulus. Four different measures of this quantity have been
defined in the literature, but none of them satisfies all the intuitive
properties (non-negativity, additivity) that characterize mutual information.
We present here a detailed analysis of the different meanings and properties
of these four definitions. We show that all these measures satisfy at least a
weaker additivity condition, i.e. one limited to the response set. This allows
us to use them for analysing correlated coding, as we illustrate in a toy
example from hippocampal place cells.



1 Information theory and mutual information 

Information theory provides a natural mathematical framework to answer the
question: how much information is contained in the neural patterns? Usually,
in an experiment, we choose a controlled sample of stimuli $\mathcal{S}$, and
we record the elicited neural responses $r \in \mathcal{R}$ when one stimulus
$s \in \mathcal{S}$ is repeatedly presented with a known a priori probability
$p(s)$. From these data, we can estimate the corresponding joint probabilities
$p(r, s)$ and the probability distribution of responses averaged over the
stimuli, $p(r)$, and then compute the mutual information¹:

$$I(\mathcal{S};\mathcal{R}) = \sum_{s \in \mathcal{S}} p(s) \sum_{r \in \mathcal{R}} p(r|s) \log_2 \frac{p(r|s)}{p(r)} \tag{1}$$

(with conditional probability $p(r|s) = p(r, s)/p(s)$ according to Bayes' rule)
or, equivalently, introducing the entropy of a probability distribution,
$H(\mathcal{R}) = -\sum_{r \in \mathcal{R}} p(r) \log_2 p(r)$:

$$I(\mathcal{S};\mathcal{R}) = H(\mathcal{R}) - H(\mathcal{R}|\mathcal{S}) = \sum_{s \in \mathcal{S}} p(s) \left[ H(\mathcal{R}) - H(\mathcal{R}|s) \right] \tag{2}$$

where $H(\mathcal{R}|\mathcal{S}) = \sum_{s \in \mathcal{S}} p(s) H(\mathcal{R}|s)$
is the conditional entropy.

¹ Estimating joint probabilities from experimental data requires very large
samples, and it is often unfeasible. Various approximate methods have been
proposed to overcome this issue.

Mutual information summarizes the average amount of knowledge we gain
about the stimulus by observing neural responses (or vice versa). For example,
in the trivial case in which stimuli and responses are completely
uncorrelated, $p(r, s) = p(r)p(s)$ and $I = 0$.
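
As a minimal numerical sketch of the definition above, assuming the joint
probabilities $p(r, s)$ are already available as a table, the mutual
information can be computed as follows. The function name
`mutual_information` and the two example tables are illustrative choices, not
part of the original analysis.

```python
import numpy as np

def mutual_information(p_rs):
    """Mutual information I(S;R) in bits from a joint table p_rs[r, s].

    Assumes p_rs is non-negative and sums to one (illustrative helper).
    """
    p_rs = np.asarray(p_rs, dtype=float)
    p_r = p_rs.sum(axis=1, keepdims=True)  # marginal p(r), shape (R, 1)
    p_s = p_rs.sum(axis=0, keepdims=True)  # marginal p(s), shape (1, S)
    mask = p_rs > 0                        # convention: 0 log 0 = 0
    ratio = p_rs[mask] / (p_r @ p_s)[mask]
    return float(np.sum(p_rs[mask] * np.log2(ratio)))

# Uncorrelated case, p(r, s) = p(r) p(s): no information is transmitted.
p_uncorrelated = np.outer([0.25, 0.75], [0.5, 0.5])
# Noiseless case: each of two equiprobable stimuli evokes its own response.
p_noiseless = np.array([[0.5, 0.0], [0.0, 0.5]])

i_uncorrelated = mutual_information(p_uncorrelated)
i_noiseless = mutual_information(p_noiseless)
```

With these two tables the function gives 0 bits for the uncorrelated case and
1 bit for the noiseless one.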

Mutual information has some mathematical properties that agree with our
intuitive notion of information. In particular, we expect that any observation
does not decrease the knowledge we have about the system. So, mutual
information has to be non-negative, as can be easily shown starting from
Shannon's definition. Furthermore, if we observe the responses of two
different neurons, or two different aspects of a single unit response,
$\{\mathcal{R}^1, \mathcal{R}^2\}$, the overall information about the stimulus
set, $I(\{\mathcal{R}^1, \mathcal{R}^2\}; \mathcal{S})$, is the sum of the
information contained in the first response $\mathcal{R}^1$, plus the
information we gain by reading the second response $\mathcal{R}^2$ given that
we know $\mathcal{R}^1$ (chain rule), i.e.:

$$I(\{\mathcal{R}^1, \mathcal{R}^2\}; \mathcal{S}) = I(\mathcal{R}^1; \mathcal{S}) + I(\mathcal{R}^2; \mathcal{S}|\mathcal{R}^1) \tag{3}$$

Relationship (3) may be used to investigate the independence of coding in a
cell population or in the presence of multiple response features. For example,
if $\{\mathcal{R}^1, \mathcal{R}^2\}$ encode different features of the
stimulus independently², then the information they convey about the stimulus
has to be the sum of the information they convey separately, i.e.
$I(\{\mathcal{R}^1, \mathcal{R}^2\}; \mathcal{S}) = I(\mathcal{R}^1; \mathcal{S}) + I(\mathcal{R}^2; \mathcal{S})$.
Thus, by evaluating these information terms (or the synergy function) we can
quantitatively estimate the different contributions to the stimulus encoding
from the different response features. Note that this additivity property
directly relies on the validity of the chain rule (3). Similarly, due to its
symmetric form, we can express the chain rule in terms of two different
stimulus features (e.g. color and shape in visual stimuli),
$\{\mathcal{S}^1, \mathcal{S}^2\}$:

$$I(\mathcal{R}; \{\mathcal{S}^1, \mathcal{S}^2\}) = I(\mathcal{R}; \mathcal{S}^1) + I(\mathcal{R}; \mathcal{S}^2|\mathcal{S}^1) \tag{4}$$
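
The independence test described above can be sketched numerically. The
example below assumes a joint table `joint[r1, r2, s]` and uses one common
sign convention for the synergy function, namely the pair information minus
the sum of the two single-feature terms; this convention, the function names
and the two toy tables are assumptions made for illustration.

```python
import numpy as np

def mi_bits(p_xy):
    """I(X;Y) in bits from a 2-D joint table p_xy[x, y]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0                       # convention: 0 log 0 = 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

def synergy(joint):
    """I({R1,R2};S) - I(R1;S) - I(R2;S) for a table joint[r1, r2, s]."""
    n1, n2, ns = joint.shape
    pair = mi_bits(joint.reshape(n1 * n2, ns))   # treat (r1, r2) as one response
    return pair - mi_bits(joint.sum(axis=1)) - mi_bits(joint.sum(axis=0))

# Independent coding: the stimulus is a pair of binary features (a, b);
# R1 reports a and R2 reports b without noise.
independent = np.zeros((2, 2, 4))
for a in range(2):
    for b in range(2):
        independent[a, b, 2 * a + b] = 0.25

# Redundant coding: both responses copy the same binary stimulus.
redundant = np.zeros((2, 2, 2))
redundant[0, 0, 0] = redundant[1, 1, 1] = 0.5

syn_independent = synergy(independent)
syn_redundant = synergy(redundant)
```

Under this convention the independent code gives zero synergy, while the
redundant code gives a negative value (here -1 bit).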



² Note that here we are dealing with information independence. This is a
different concept from the independence between the two stimulus sets (i.e.
$p(x, d) = p(x)p(d)$) and also from conditional independence (i.e.
$p(x, d|r) = p(x|r)p(d|r)$). These are three distinct, although
interconnected, measures of independence.
2 Stimulus specific information



In many cases it may be interesting to know which stimulus in a given set
carries the most information, or to investigate how a single stimulus is
encoded in terms of different response features. In his original formulation,
Shannon did not provide any insight into how much information can be carried
by a single symbol, such as a single stimulus in our case. After Shannon's
seminal work, many definitions of one-symbol specific information have been
introduced, usually referred to as stimulus specific information in the neural
processing context. Ideally, stimulus specific information should be proper
information in a mathematical sense (non-negative, additive) and give mutual
information when averaged over the stimulus set. Four alternative definitions
have been proposed in the literature. We give here a detailed analysis of
their features and different meanings³. In particular, we investigate the
different roles of stimulus and response regarding the additivity rules,
Eqs. (3) and (4) (Section 3), and show a possible application (Section 4). In
Table 1 we summarize the main features of these four quantities.

We conclude that there is no "fully" satisfactory definition of this quantity,
in the sense that none of these definitions shares all the mathematical
properties that make Shannon mutual information so appealing from the
application point of view; but each of them can be used to investigate
different aspects of single stimulus information transmission.

Stimulus specific surprise 

Originally proposed by Fano [7], this definition can be immediately inferred
from Eq. (1), simply taking the single stimulus contribution to the sum:

$$I_1(s) = \sum_{r \in \mathcal{R}} p(r|s) \log_2 \frac{p(r|s)}{p(r)}$$

This quantity measures the deviation (also called Kullback-Leibler distance)
between the marginal distribution $p(r)$ and the conditional probability
distribution $p(r|s)$. It clearly averages to the mutual information, i.e.
$\sum_{s \in \mathcal{S}} p(s) I_1(s) = I(\mathcal{S};\mathcal{R})$, and it is
always non-negative: $I_1(s) \geq 0$ for $s \in \mathcal{S}$. Furthermore, it
is the only non-negative decomposition of the mutual information (for the
proof, see Appendix 2 in Ref. [11]). Since $I_1(s)$ is large when $p(r|s)$
dominates in the regions where $p(r)$ is small, i.e. in the presence of
surprising events, this quantity is often referred to as "stimulus specific
surprise" or simply "surprise". Surprise lacks additivity, and this causes
many difficulties when we want to apply it to a sequence of observations.
Despite this main drawback, specific surprise has been widely used in the
neural coding literature.

³ We deal with possible definitions of the same quantity, so all these
definitions are usually referred to as "stimulus specific information". For
the sake of simplicity, we extend the notation of [11] to all four
definitions, referring to them as $I_1$, $I_2$, $I_3$ and $I_4$. We also
mention their original names as introduced in the respective original works.
Stimulus specific information

An entropy-based definition has been proposed by DeWeese and Meister [11],
and it may be derived from Eq. (2) by extracting the single stimulus
contribution from the sum:

$$I_2(s) = H(\mathcal{R}) - H(\mathcal{R}|s) = -\sum_{r \in \mathcal{R}} \left[ p(r) \log_2 p(r) - p(r|s) \log_2 p(r|s) \right]$$



Here, information is identified with the reduction of entropy between the
marginal distribution $p(r)$ and the conditional probability $p(r|s)$. This
quantity captures the reliability of the neural response for a given stimulus.
Indeed, it expresses the difference between the uncertainty in the a priori
knowledge of the response set, $H(\mathcal{R})$, and the uncertainty after
stimulus presentation, $H(\mathcal{R}|s)$. A stimulus characterized by highly
predictable responses has a large $I_2(s)$ value. This means that we can
easily predict the response when we know the stimulus, but not necessarily
vice versa.

As shown in [11], this is the only decomposition of mutual information that
is also additive but, unlike mutual information, it can assume negative
values.

Note that any weighted combination of $I_1$ and $I_2$, with weights summing
to one, averages to mutual information, and can represent a possible
definition of stimulus specific information. Thus, we have an infinite number
of plausible choices for a stimulus-dependent decomposition of mutual
information. But, as mentioned above, only $I_1$ is always non-negative, and
only $I_2$ fulfils the chain rule, Eqs. (3) and (4).
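
The properties of the surprise $I_1$ and of the entropy-based $I_2$ can be
checked on a small example. The sketch below assumes a table
`p_r_given_s[r, s]` of conditional probabilities; the function names and the
two-stimulus example (a frequent stimulus with a reliable response and a rare
stimulus with a maximally variable response) are hypothetical choices made for
illustration.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def kl_bits(q, p):
    """Kullback-Leibler deviation sum_r q(r) log2[q(r)/p(r)] in bits."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log2(q[mask] / p[mask])))

def surprise_and_entropy_terms(p_r_given_s, p_s):
    """Return (I1, I2) per stimulus for a table p_r_given_s[r, s]."""
    q = np.asarray(p_r_given_s, dtype=float)
    p_s = np.asarray(p_s, dtype=float)
    p_r = q @ p_s                                   # marginal p(r)
    n_s = q.shape[1]
    i1 = np.array([kl_bits(q[:, s], p_r) for s in range(n_s)])
    i2 = np.array([entropy_bits(p_r) - entropy_bits(q[:, s])
                   for s in range(n_s)])
    return i1, i2

# Hypothetical example: stimulus 0 is frequent and always evokes response 0;
# stimulus 1 is rare and evokes the three responses with equal probability.
p_s = np.array([0.9, 0.1])
p_r_given_s = np.array([[1.0, 1 / 3],
                        [0.0, 1 / 3],
                        [0.0, 1 / 3]])

i1, i2 = surprise_and_entropy_terms(p_r_given_s, p_s)
mi_from_i1 = float(p_s @ i1)   # average of I1 over the stimulus set
mi_from_i2 = float(p_s @ i2)   # average of I2 over the stimulus set
```

In this example both averages coincide with the mutual information, $I_1$ is
non-negative for both stimuli, and $I_2$ is negative for the rare stimulus,
because its response entropy exceeds the entropy of the marginal $p(r)$.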

Stimulus specific information (SSI) (2) 

More recently, Butts [13] introduced a new definition, which emphasizes the
causal relationship between stimulus and response in neural processing:

$$I_3(s) = \mathrm{SSI}(s) = \sum_{r \in \mathcal{R}} p(r|s) I_2(r) = \sum_{r \in \mathcal{R}} p(r|s) \left[ H(\mathcal{S}) - H(\mathcal{S}|r) \right] = H(\mathcal{S}) - \sum_{r \in \mathcal{R}} p(r|s) H(\mathcal{S}|r)$$

where $I_2(r)$ is the one-symbol specific information applied to a single
neural response: $I_2(r) = H(\mathcal{S}) - H(\mathcal{S}|r)$. This measure
represents the average reduction of uncertainty (difference of entropy) after
an observation of the response given the stimulus, or, in other words, the
stimulus-conditioned average of the response-specific information $I_2(r)$.
So, if one stimulus is characterised by very informative responses, in the
sense of $I_2(r)$ (such as responses that almost completely determine the
stimulus), it results in a large value of $I_3$.
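
A minimal sketch of this definition, assuming again a table
`p_r_given_s[r, s]` and prior `p_s`, is given below. The function name
`ssi_I3` and the numerical values are hypothetical; the code first computes
the response-specific term $I_2(r)$ and then averages it with $p(r|s)$.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def ssi_I3(p_r_given_s, p_s):
    """I3(s) = sum_r p(r|s) [H(S) - H(S|r)] for a table p_r_given_s[r, s]."""
    q = np.asarray(p_r_given_s, dtype=float)
    p_s = np.asarray(p_s, dtype=float)
    joint = q * p_s                      # p(r, s), broadcast over columns
    p_r = joint.sum(axis=1)              # marginal p(r)
    h_s = entropy_bits(p_s)
    # Response-specific information I2(r) = H(S) - H(S|r).
    i2_r = np.zeros(p_r.size)
    for r in range(p_r.size):
        if p_r[r] > 0:
            i2_r[r] = h_s - entropy_bits(joint[r] / p_r[r])
    return q.T @ i2_r                    # average over r with weights p(r|s)

# Hypothetical example with three stimuli and two responses.
p_s = np.array([0.5, 0.25, 0.25])
p_r_given_s = np.array([[0.9, 0.2, 0.1],
                        [0.1, 0.8, 0.9]])
i3 = ssi_I3(p_r_given_s, p_s)
mi_from_i3 = float(p_s @ i3)
```

Averaging `i3` with the prior `p_s` recovers the mutual information of the
table, since $\sum_s p(s) p(r|s) = p(r)$.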
Definition        Positive    Chain rule     Chain rule    Average MI
                  definite    (Responses)    (Stimuli)

I_1 (Surprise)    Yes         Yes            No            Yes
I_2               No          Yes            Yes           Yes
I_3               No          Yes            No            Yes
I_4               Yes         Yes            Yes/No        No

Table 1: Main properties of the four definitions of stimulus specific information.



Stimulus information density 

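A fourth definition has been proposed by Bezzi et al. [12] in the framework
of position encoding in the hippocampal formation:

$$\begin{align}
I_4(s) &= I(\mathcal{R}; \{s, \bar{s}\}) \tag{5} \\
&= \sum_{r \in \mathcal{R}} \left[ p(s)\, p(r|s) \log_2 \frac{p(r|s)}{p(r)} + p(\bar{s})\, p(r|\bar{s}) \log_2 \frac{p(r|\bar{s})}{p(r)} \right] \tag{6}
\end{align}$$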



where $\bar{s}$ is the complementary set of $s$, i.e.
$\bar{s} = \bigcup_{s' \in \mathcal{S},\, s' \neq s} s'$ (with
$s \cup \bar{s} = \mathcal{S}$). In other words, we partition the set of
stimuli into two subsets, one containing the stimulus $s$ only and the
complementary set composed of all the other stimuli, then we compute the
average mutual information using these two stimulus sets. This is a measure of
how well we can distinguish between the single stimulus $s$ and all the others
by observing the neural response $\mathcal{R}$. $I_4$ is an actual mutual
information, thus it is non-negative and additive (this last condition holds
only for a very particular choice of the stimuli, see below), but it does not
average to the mutual information of the whole stimulus set (i.e.
$\sum_{s \in \mathcal{S}} p(s) I_4(s) \neq I(\mathcal{S};\mathcal{R})$).

This quantity has a simple relationship with the specific surprise $I_1$ for
very unlikely stimuli, $p(s) \ll 1$. Indeed, expanding Eq. (6) for
$p(s) \ll 1$ we get:

$$I_4(s) = p(s) \sum_{r \in \mathcal{R}} p(r|s) \log_2 \frac{p(r|s)}{p(r)} + O(p(s)^2) \tag{7}$$

whose leading term corresponds to $p(s) I_1(s)$. This expression has two
consequences: it supplies an additional theoretical justification for specific
surprise, and it ensures that, if we have a large set of stimuli characterized
by $p(s) \ll 1$ for each $s \in \mathcal{S}$, the sum of the local information
converges to the average mutual information.
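
The small-$p(s)$ behaviour can be illustrated numerically. The sketch below
evaluates $I_4(s)$ as the sum of its two terms, $p(s) I_1(s)$ plus the
contribution of the complementary set $\bar{s}$, so that the second term is
exactly the correction to the leading order. The example (100 equiprobable
stimuli, a binary response whose firing probability varies smoothly from 0.3
to 0.7) and all names are assumptions chosen for illustration.

```python
import numpy as np

def kl_bits(q, p):
    """Kullback-Leibler deviation sum_r q(r) log2[q(r)/p(r)] in bits."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log2(q[mask] / p[mask])))

def density_I4(p_r_given_s, p_s, s):
    """I4(s): mutual information between R and the partition {s, s-bar}."""
    q = np.asarray(p_r_given_s, dtype=float)
    p_s = np.asarray(p_s, dtype=float)
    p_r = q @ p_s                                   # marginal p(r)
    ps = p_s[s]
    # p(r | s-bar): responses pooled over all stimuli other than s
    # (clipped at zero to guard against rounding).
    q_bar = np.clip((p_r - ps * q[:, s]) / (1.0 - ps), 0.0, None)
    return ps * kl_bits(q[:, s], p_r) + (1.0 - ps) * kl_bits(q_bar, p_r)

# Hypothetical example: many equally unlikely stimuli, binary response.
n = 100
p_s = np.full(n, 1.0 / n)
p_fire = np.linspace(0.3, 0.7, n)                   # p(r = 1 | s)
p_r_given_s = np.vstack([1.0 - p_fire, p_fire])     # shape (2, n)
p_r = p_r_given_s @ p_s

i1 = np.array([kl_bits(p_r_given_s[:, s], p_r) for s in range(n)])
i4 = np.array([density_I4(p_r_given_s, p_s, s) for s in range(n)])
mi = float(p_s @ i1)                                # mutual information
```

In this example `i4` is never smaller than `p_s * i1`, and the sum of `i4`
over all stimuli stays within a few percent of the mutual information `mi`.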



3 A weaker additivity condition 

Among the above four definitions of stimulus specific information, only $I_2$
is additive (obeying Eqs. (3) and (4)). But, despite not being additive in a
rigorous sense, the other three quantities still fulfil relationship (3)⁴, but
not (4); in other words, they are additive when limited to the response set.
This results in a weaker additivity property, which may still be used for
investigating how different features of the response encode a single stimulus,
such as testing independence (see Section 4 for an example). We should remark
that, due to the different meanings of the four definitions, this analysis can
give contradictory results depending on the measure chosen. A special case is
represented by $I_4$: because it is a proper mutual information (with a
particular choice of stimulus set), we expect it to be additive in terms of
stimuli, too. But $I_4(s)$ requires an additional constraint: the partition of
the stimulus set $\mathcal{S}$ into the two subsets $\{s, \bar{s}\}$. This
last condition cannot be preserved in general for the chain rule, Eq. (4)
[12].

⁴ Due to space limitations, we do not report the full proofs here. In short,
the proof for $I_1$ follows from the additivity property of entropy, for $I_2$
and $I_3$ applying Bayes' rule twice to
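
As an illustration of additivity limited to the response set, the sketch
below checks numerically that the surprise $I_1$ of a pair of response
features splits into the surprise of the first feature plus a conditional
term. The form of the conditional term,
$\sum_{r^1, r^2} p(r^1, r^2|s) \log_2 [p(r^2|r^1, s)/p(r^2|r^1)]$, is an
assumption made here (it is the standard chain rule for the Kullback-Leibler
deviation), and the random joint table is purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint table joint[r1, r2, s] with strictly positive entries.
joint = rng.random((3, 4, 5)) + 0.01
joint /= joint.sum()

p_s = joint.sum(axis=(0, 1))     # p(s)
p_r12 = joint.sum(axis=2)        # p(r1, r2)
p_r1 = p_r12.sum(axis=1)         # p(r1)

def surprise_terms(s):
    """Return I1({R1,R2}; s), I1(R1; s) and the conditional term for s."""
    q12 = joint[:, :, s] / p_s[s]            # p(r1, r2 | s)
    q1 = q12.sum(axis=1)                     # p(r1 | s)
    pair = np.sum(q12 * np.log2(q12 / p_r12))
    first = np.sum(q1 * np.log2(q1 / p_r1))
    # Conditional term: sum p(r1,r2|s) log2[p(r2|r1,s) / p(r2|r1)].
    q2_given_1 = q12 / q1[:, None]
    p2_given_1 = p_r12 / p_r1[:, None]
    second = np.sum(q12 * np.log2(q2_given_1 / p2_given_1))
    return float(pair), float(first), float(second)

terms = [surprise_terms(s) for s in range(joint.shape[2])]
```

For every stimulus in the table, the first returned value equals the sum of
the other two up to rounding error.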

4 Stimulus specific information in hippocampal 
place cells 

To show the different behavior of the stimulus specific information measures
in a more realistic setting, we consider here the example of spatial encoding
in hippocampal place cells.

Place cells selectively fire at an elevated rate when the animal is in a
particular location of an environment and, in some cases (e.g. linear tracks),
moving in one specific direction. For the sake of simplicity we do not
consider direction here. The place cell firing profile is then characterized
by a silent part (when the animal is not in the place field) and an active
part corresponding to the place field. Let us consider an ideal experiment in
which a rat is moving along a linear track in one direction at constant speed
(from left to right in Fig. 1). The set of locations is the stimulus set,
which we consider discrete and finite, composed of $N$ elements. We indicate
with $x \in \mathcal{X}$ a single position, with $p(x) = 1/N$; $x_1$
corresponds to the place field, and we measure the firing rate $r$ of the cell
in each spatial bin $x$. Assuming a Gaussian profile inside the place field
(see Fig. 1, top-left), and considering the cell completely silent elsewhere
(i.e. neglecting fluctuations, Fig. 1, top-right), we are able to compute
analytically $I_{1,2,3,4}$ and the mutual information. Results are summarised
in Fig. 2. $I_2(x_1)$ may be negative if $\sigma$ is large enough, because the
uncertainty of the response at the place field, $H(\mathcal{R}|x_1)$, can be
larger than the average uncertainty $H(\mathcal{R})$. This implies that the
place field corresponds to the least informative stimulus according to the
$I_2$ measure. On the contrary, using the other definitions, $x_1$ is by far
the most informative location, in agreement with our intuitive notion that the
cell codes for the position corresponding to the place field.
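
The contrast between $I_1$ and $I_2$ in this example can be reproduced with a
discretised sketch. The parameter values ($N = 6$, $\sigma = 2.5$ Hz, mean
rate 20 Hz) are read from the caption of Fig. 2; the discretisation of the
firing rate into 1 Hz bins is an assumption made here, and the numerical value
of $I_2$ depends on that bin width, so this is not the analytical calculation
reported in the figure.

```python
import numpy as np

N = 6                        # number of spatial bins
r_mean, sigma = 20.0, 2.5    # place-field rate profile, in Hz

# Assumption: rates are discretised into 1 Hz bins; response index 0 means
# "silent", indices 1..40 correspond to 1..40 Hz.
rates = np.arange(1, 41)
gauss = np.exp(-0.5 * ((rates - r_mean) / sigma) ** 2)
gauss /= gauss.sum()

# p_r_given_x[r, x]: column 0 is the place field x_1, the others are silent.
p_r_given_x = np.zeros((rates.size + 1, N))
p_r_given_x[1:, 0] = gauss
p_r_given_x[0, 1:] = 1.0
p_x = np.full(N, 1.0 / N)
p_r = p_r_given_x @ p_x      # marginal p(r)

def entropy_bits(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def kl_bits(q, p):
    """Kullback-Leibler deviation sum_r q(r) log2[q(r)/p(r)] in bits."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log2(q[mask] / p[mask])))

i1 = np.array([kl_bits(p_r_given_x[:, x], p_r) for x in range(N)])
i2 = np.array([entropy_bits(p_r) - entropy_bits(p_r_given_x[:, x])
               for x in range(N)])
```

With these assumptions the surprise is $\log_2 N$ at the place field and
$\log_2 [N/(N-1)]$ elsewhere, while $I_2$ is negative at the place field and
positive at the silent locations.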

To better illustrate the role of the additivity rule, Eq. (3), let us consider
a simple extension of the previous example, where two different aspects of the
single cell neural response are considered. In a similar setup, we measure the
average firing rate and the phase shift with respect to a specific rhythm: the
theta oscillations. Indeed, during locomotion the hippocampus is characterized
by a high-amplitude 4-8 Hz oscillation, called theta rhythm. Theta
oscillations play an important role in spatial encoding. Cell spiking shifts
gradually to earlier phases of the theta cycle as the animal moves through the
cell's place field (theta phase precession) [14].

$p(r^1, r^2|s)$ and using the normalisation condition for $p(r|s)$; $I_4$ is
an actual mutual information and therefore additive.

Figure 1: A rat running on a linear track (unidirectional). Position on the
track is binned into $N$ spatial bins. Conditional probability distribution
for a place cell: $x_1$, place field (top, left), and outside the place field
(top, right).

For the sake of simplicity, in our toy example, we take this quantity to
assume just two values: $\theta = +$ (say a phase shift of 270°) and
$\theta = -$ (say a phase shift of 90°). Furthermore, we also associate this
feature with locations outside the place field (this is not in general
appropriate for real place cells), see Fig. 3. So each cell response is
represented by two quantities: firing rate and average phase shift, where the
latter is a binary variable. We take finer spatial bins, so that the place
field is covered by two spatial bins. The place cell fires in the place field,
corresponding to positions $x_1$ and $x_2$, with a Gaussian profile (as shown
in Fig. 3, top-left). For $x_1$, spike times follow on average the theta
oscillatory maximum ($\theta = +$) and, vice versa, in $x_2$ they come earlier
($\theta = -$). Similarly, $\theta = +$ in the other odd positions and
$\theta = -$ in the other even positions, where the cell is silent. We neglect
fluctuations and we assume that the firing rate is zero in these locations.

Figure 2: Numerical values of stimulus specific information for different
locations $x$. Left, $I_{1,2,3,4}$ considering only firing rate. Place field
corresponds to $x = 1$. Parameter values: $N = 6$, $\sigma = 2.5$ Hz and
$r = 20$ Hz for the place field profile.

Overall we have $2N$ different positions. Assuming, as before, that all the
positions are equally likely, $p(x) = 1/(2N)$, and that
$p(\theta = +) = p(\theta = -) = 1/2$, we can analytically estimate the mutual
information and all the stimulus specific information measures. Results are
summarised in Fig. 4. As in the previous example, the place field location
corresponds to the most informative stimulus according to $I_{1,3,4}$, but to
the least informative one according to $I_2$. Furthermore, in this example, we
can test whether $r$ and $\theta$ encode different aspects of the stimulus $x$
independently or not, by checking whether
$I(\{\mathcal{R}, \Theta\}; x) = I(\Theta; x) + I(\mathcal{R}; x)$. These
terms can be computed analytically using the assumptions above. We find that,
using $I_{1,2,3}$, phase and average firing rate encode a single position $x$
independently. Since averaging these quantities over the stimulus set gives
mutual information, the independence condition is also satisfied for mutual
information. On the contrary, we have redundant encoding using $I_4$. This
discrepancy between the behavior of $I_{1,2,3}$ (and mutual information too)
and that of $I_4$ is due to the fact that, considering local information, we
implicitly correlate the two response features by putting together all the
locations different from $x_1$, so that phase and average firing rate are
mixed together in $p(\{r, \theta\}|\bar{x}_1)$.

Note that, to apply the same criterion to test encoding independence for
different stimulus features $\mathcal{S}^1$, $\mathcal{S}^2$ (e.g. location
and direction), only $I_2$ may be used.
Figure 3: A rat running on a linear track (unidirectional). Position on the
track is binned into $2N$ spatial bins. Conditional probability distribution
for a place cell: $x_1$, $x_2$, place field (top, left), for
$\theta = +, -$, respectively, and $x \neq x_{1,2}$ (top, right), outside the
place field.

5 Conclusions 

Shannon mutual information is symmetric in the stimulus and response sets.
When we consider stimulus specific information, we lose this symmetry
property, resulting in a weaker additivity condition, limited to the response
set. So, although none of the definitions of stimulus specific information has
all the mathematical properties of mutual information, all of them may be used
for exploring correlated encoding by multiple response features. In addition,
we have shown that different stimulus specific information measures are
suitable for answering different questions and, not surprisingly, they may
assume different numerical values in the same context. In conclusion, there is
no unique definition, and the right measure should be chosen according to the
aspect of neural coding we are interested in studying.
Figure 4: Numerical values of stimulus specific information for different
locations $x$. $I_{1,2,3,4}$ considering firing rate and phase shift (binary
values). Place field corresponds to $x = 1$ and $x = 2$. Parameter values:
$N = 6$, $\sigma = 2.5$ Hz and $r = 20$ Hz for the place field profile.